A simple try assisted with GPT: nonblocking gather/scatter exchanges #7396
Open
mystic-qaq wants to merge 9 commits into
Pull request overview
Refactors PW_Basis::gatherp_scatters and PW_Basis::gathers_scatterp from blocking MPI_Alltoallv to non-blocking MPI_Irecv/MPI_Isend exchanges with overlapping pack/unpack work, adds a per-instance reusable communication workspace, and introduces a round-trip unit test.
Changes:
- Replace `MPI_Alltoallv` with non-blocking sends/receives plus `MPI_Waitsome`-driven unpack overlap in both gather/scatter directions, separating send/receive into distinct workspace slices.
- Add `acquire_comm_workbuf<T>()` returning per-instance `mutable` `std::vector` storage (float and double specializations) and add fine-grained timer regions.
- Add `test_comm_roundtrip.cpp` (round-trip equality and a zero-plane "stress" layout sweep) and register it in the test `CMakeLists.txt`.
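For readers unfamiliar with the pattern, the following is a minimal single-process C++17 model of the bookkeeping these bullets describe. It is not MPI code and not the PR's implementation: `Layout`, `make_layout`, `unpack_block` and `run_exchange` are hypothetical names, and the data that remote ranks would deliver is faked with a fill loop. It shows per-peer counts turned into displacements by exclusive prefix sums (the form `MPI_Alltoallv` takes), one workspace split into disjoint send and receive slices, the local block handled by a plain copy, and an index-based unpack whose result does not depend on the order in which blocks are processed, which is the property an `MPI_Waitsome` loop relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical single-process model of one rank's bookkeeping in a nonblocking
// gather/scatter. Names are illustrative only; they are not the PW_Basis API.
struct Layout {
    std::vector<int> sendcounts, sdispls, recvcounts, rdispls;
    int send_total = 0;
    int recv_total = 0;
};

// Exclusive prefix sums turn per-peer counts into per-peer displacements.
Layout make_layout(const std::vector<int>& sc, const std::vector<int>& rc) {
    assert(sc.size() == rc.size());  // one entry per peer in both directions
    Layout L;
    L.sendcounts = sc;
    L.recvcounts = rc;
    L.sdispls.resize(sc.size());
    L.rdispls.resize(rc.size());
    std::exclusive_scan(sc.begin(), sc.end(), L.sdispls.begin(), 0);
    std::exclusive_scan(rc.begin(), rc.end(), L.rdispls.begin(), 0);
    L.send_total = std::accumulate(sc.begin(), sc.end(), 0);
    L.recv_total = std::accumulate(rc.begin(), rc.end(), 0);
    return L;
}

// Scatter-by-index unpack of one peer's block. dest_index is a permutation, so
// blocks from different peers write disjoint outputs and can be unpacked in
// whatever order their receives complete.
void unpack_block(const Layout& L, int peer, const double* recv,
                  const std::vector<int>& dest_index, std::vector<double>& out) {
    const int begin = L.rdispls[peer];
    const int end = begin + L.recvcounts[peer];
    for (int k = begin; k < end; ++k) out[dest_index[k]] = recv[k];
}

// Runs the whole model for rank `self` and unpacks peer blocks in `order`.
std::vector<double> run_exchange(const std::vector<int>& order) {
    const int self = 1;
    const std::vector<int> counts = {3, 2, 4};  // symmetric counts for brevity
    const Layout L = make_layout(counts, counts);

    // One workspace split into two non-overlapping slices: [ send | recv ].
    std::vector<double> ws(L.send_total + L.recv_total);
    double* send = ws.data();
    double* recv = ws.data() + L.send_total;
    for (int k = 0; k < L.send_total; ++k) send[k] = 100.0 + k;  // "packed" data

    // Self block: a plain copy, no message is posted for it.
    std::copy(send + L.sdispls[self],
              send + L.sdispls[self] + L.sendcounts[self],
              recv + L.rdispls[self]);
    // Stand-in for the data remote peers would deliver via MPI_Irecv.
    for (int p = 0; p < static_cast<int>(counts.size()); ++p)
        if (p != self)
            for (int i = 0; i < L.recvcounts[p]; ++i)
                recv[L.rdispls[p] + i] = 10.0 * p + i;

    // Destination map: here simply the reversal of 0..recv_total-1.
    std::vector<int> dest_index(L.recv_total);
    std::iota(dest_index.rbegin(), dest_index.rend(), 0);

    std::vector<double> out(L.recv_total, 0.0);
    for (int p : order) unpack_block(L, p, recv, dest_index, out);
    return out;
}
```

In real MPI code, the fill loop would be replaced by an `MPI_Irecv` into `recv + rdispls[p]` and an `MPI_Isend` from `send + sdispls[p]` for every peer `p != self`, followed by an `MPI_Waitsome` loop that calls the unpack step for each receive reported as complete. Keeping send and receive in separate slices means no incoming message can overwrite data that is still being sent.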
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| source/source_basis/module_pw/pw_gatherscatter.h | Rewrites both routines to use Irecv/Isend with manual self-copy, dedicated send/recv workspace, and overlapped unpack via MPI_Waitsome. |
| source/source_basis/module_pw/pw_basis.h | Declares acquire_comm_workbuf plus mutable per-instance buffers; adds <vector> include. |
| source/source_basis/module_pw/test/test_comm_roundtrip.cpp | New round-trip tests using a friend accessor subclass to call the protected gather/scatter methods. |
| source/source_basis/module_pw/test/CMakeLists.txt | Registers the new test source file. |
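The new test is described as using an accessor subclass to reach the protected gather/scatter methods. The sketch below shows one common C++ way to do that, re-exporting protected members with using-declarations, together with a round-trip check. `ToyBasis`, `ToyBasisAccessor` and `round_trip_ok` are hypothetical stand-ins (a permutation replaces the real MPI exchange), and the PR's test may rely on a `friend` declaration instead.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a class whose exchange routines are protected (hypothetical,
// not PW_Basis). scatter() permutes data into another layout; gather() undoes it.
class ToyBasis {
public:
    explicit ToyBasis(std::vector<int> perm) : perm_(std::move(perm)) {}
protected:
    void scatter(const std::vector<double>& in, std::vector<double>& out) const {
        out.resize(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) out[perm_[i]] = in[i];
    }
    void gather(const std::vector<double>& in, std::vector<double>& out) const {
        out.resize(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[perm_[i]];
    }
private:
    std::vector<int> perm_;
};

// Test-only accessor: a subclass that re-exports the protected members.
class ToyBasisAccessor : public ToyBasis {
public:
    using ToyBasis::ToyBasis;  // inherit the constructor
    using ToyBasis::scatter;   // make protected members callable from tests
    using ToyBasis::gather;
};

// Round-trip check: scatter then gather must reproduce the input exactly.
bool round_trip_ok(const std::vector<int>& perm, const std::vector<double>& data) {
    assert(perm.size() == data.size());
    ToyBasisAccessor b(perm);
    std::vector<double> mid, back;
    b.scatter(data, mid);
    b.gather(mid, back);
    return back == data;
}
```

Because the accessor exists only in test code, the production class keeps its protected interface unchanged. The empty-input case is the toy analogue of a layout in which a rank owns zero planes.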
mohanchen reviewed on May 30, 2026
```cpp
std::string precision = "double"; ///< single, double, mixing
bool double_data_ = true; ///< if has double data
bool float_data_ = false; ///< if has float data
mutable std::vector<std::complex<float>> comm_workbuf_float_;
```
Collaborator comment:

Using the `mutable` keyword is not recommended. It breaks const semantics, hides state changes, and introduces potential thread-safety risks, so use it only as a last resort.
…rk buffers

Remove the mutable keyword from comm_workbuf_float_ and comm_workbuf_double_ by switching from std::vector (whose const data() returns const T*) to std::unique_ptr<T[]> (whose get() is a const method that returns a non-const T*).

Key changes:
- Pre-allocate the work buffers in allocate_comm_buffers(), called from getstartgr(), using the already-computed numr/startr/numg/startg arrays to determine the maximum required buffer size.
- acquire_comm_workbuf<T>() no longer resizes lazily; it returns the pre-allocated buffer via unique_ptr::get(), guarded by an assertion.
- Add cleanup in the destructor via unique_ptr::reset().

Rationale: unique_ptr::get() is a const method that returns a non-const T*, which matches the semantic intent: a const PW_Basis does not re-seat the buffer pointer, but the pointed-to scratch memory remains writable for MPI operations. This avoids the thread-safety concerns of mutable while maintaining const-correctness throughout the gather/scatter call chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
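A minimal sketch of the constness argument in this commit, assuming a class that owns a single pre-allocated scratch buffer; `WorkspaceOwner`, `allocate`, `acquire` and `sum_via_scratch` are hypothetical names, not PW_Basis members. A const member function can write through the pointer returned by `std::unique_ptr<T[]>::get()` because the constness applies to the `unique_ptr` object, not to the array it owns.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>

// Hypothetical illustration of the commit's idea, not the PW_Basis code.
class WorkspaceOwner {
public:
    // Pre-allocate once (the commit does this from getstartgr()).
    void allocate(std::size_t n) {
        buf_ = std::make_unique<std::complex<double>[]>(n);
        size_ = n;
    }
    // unique_ptr::get() is a const member returning a non-const pointer, so a
    // const method can hand out writable scratch memory without `mutable`.
    std::complex<double>* acquire(std::size_t need) const {
        assert(buf_ && need <= size_);  // guard instead of lazy resize
        return buf_.get();
    }
    // A const routine that writes into the scratch buffer.
    std::complex<double> sum_via_scratch(const std::complex<double>* in,
                                         std::size_t n) const {
        std::complex<double>* scratch = acquire(n);
        for (std::size_t i = 0; i < n; ++i) scratch[i] = 2.0 * in[i];
        std::complex<double> s{0.0, 0.0};
        for (std::size_t i = 0; i < n; ++i) s += scratch[i];
        return s;
    }
private:
    std::unique_ptr<std::complex<double>[]> buf_;
    std::size_t size_ = 0;
};
```

Note that this constness is shallow: every call on the same instance still shares one buffer, so two threads invoking a const gather/scatter on the same object concurrently would race just as they would with a `mutable` member. If that use case matters, per-thread or per-call buffers would be needed.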
Include a standalone microbenchmark (bench_comm.cpp) comparing blocking vs. nonblocking MPI gather/scatter, and PR_DESCRIPTION.md with the design rationale and performance validation results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the simplified microbenchmark with a benchmark that directly calls PW_Basis::gatherp_scatters()/gathers_scatterp() (feat/unblock) and compares them against the exact blocking implementations from the develop branch. It uses realistic ABACUS parameters (10 Å cell, ecut = 100 Ry, 64^3 FFT grid). Key results: the nonblocking version is 1.06x-1.45x faster at 3 or more MPI ranks, with a maximum speedup of 1.45x at 4 ranks with 2 OpenMP threads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
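The reported figures are ratios of run times. A minimal sketch of how such a ratio can be formed from per-rank timings, assuming (this is not confirmed by the PR) that each variant is summarized by its slowest rank, since a collective exchange completes only when the last rank finishes; `slowest` and `speedup` are hypothetical helpers. In an MPI program the per-rank maximum would typically come from `MPI_Reduce` with `MPI_MAX` over `MPI_Wtime` differences.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical post-processing for an MPI timing run, not the PR's benchmark.
// The wall time of a collective exchange is bounded by its slowest rank.
double slowest(const std::vector<double>& per_rank_seconds) {
    assert(!per_rank_seconds.empty());
    return *std::max_element(per_rank_seconds.begin(), per_rank_seconds.end());
}

// Speedup > 1 means the nonblocking variant finished sooner than the blocking one.
double speedup(const std::vector<double>& blocking,
               const std::vector<double>& nonblocking) {
    return slowest(blocking) / slowest(nonblocking);
}
```

Summarizing by the slowest rank rather than the mean avoids crediting a variant for ranks that simply sit idle waiting on others.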
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR.md